AITopics | hausa language

Collaborating Authors

hausa language

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Dataset Creation and Baseline Models for Sexism Detection in Hausa

Muhammad, Fatima Adam, Hassan, Shamsuddeen Muhammad, Inuwa-Dutse, Isa

arXiv.org Artificial IntelligenceNov-3-2025

Sexism reinforces gender inequality and social exclusion by perpetuating stereotypes, bias, and discriminatory norms. Noting how online platforms enable various forms of sexism to thrive, there is a growing need for effective sexism detection and mitigation strategies. While computational approaches to sexism detection are widespread in high-resource languages, progress remains limited in low-resource languages where limited linguistic resources and cultural differences affect how sexism is expressed and perceived. This study introduces the first Hausa sexism detection dataset, developed through community engagement, qualitative coding, and data augmentation. For cultural nuances and linguistic representation, we conducted a two-stage user study (n=66) involving native speakers to explore how sexism is defined and articulated in everyday discourse. We further experiment with both traditional machine learning classifiers and pre-trained multilingual language models and evaluating the effectiveness few-shot learning in detecting sexism in Hausa. Our findings highlight challenges in capturing cultural nuance, particularly with clarification-seeking and idiomatic expressions, and reveal a tendency for many false positives in such cases.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2510.27038

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (0.49)
Public Relations > Community Relations (0.34)

Industry: Law > Civil Rights & Constitutional Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Who Wrote This? Identifying Machine vs Human-Generated Text in Hausa

Sani, Babangida, Soy, Aakansha, Imam, Sukairaj Hafiz, Mustapha, Ahmad, Aliyu, Lukman Jibril, Abdulmumin, Idris, Ahmad, Ibrahim Said, Muhammad, Shamsuddeen Hassan

arXiv.org Artificial IntelligenceMar-17-2025

The advancement of large language models (LLMs) has allowed them to be proficient in various tasks, including content generation. However, their unregulated usage can lead to malicious activities such as plagiarism and generating and spreading fake news, especially for low-resource languages. Most existing machine-generated text detectors are trained on high-resource languages like English, French, etc. In this study, we developed the first large-scale detector that can distinguish between human- and machine-generated content in Hausa. We scrapped seven Hausa-language media outlets for the human-generated text and the Gemini-2.0 flash model to automatically generate the corresponding Hausa-language articles based on the human-generated article headlines. We fine-tuned four pre-trained Afri-centric models (AfriTeVa, AfriBERTa, AfroXLMR, and AfroXLMR-76L) on the resulting dataset and assessed their performance using accuracy and F1-score metrics. AfroXLMR achieved the highest performance with an accuracy of 99.23% and an F1 score of 99.21%, demonstrating its effectiveness for Hausa text detection. Our dataset is made publicly available to enable further research.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2503.13101

Country:

North America > United States (1.00)
Africa > Nigeria > Jigawa State > Dutse (0.05)
North America > Dominican Republic (0.04)
(3 more...)

Genre: Research Report (0.70)

Industry: Media > News (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Detection of Offensive and Threatening Online Content in a Low Resource Language

Adam, Fatima Muhammad, Zandam, Abubakar Yakubu, Inuwa-Dutse, Isa

arXiv.org Artificial IntelligenceNov-17-2023

Hausa is a major Chadic language, spoken by over 100 million people in Africa. However, from a computational linguistic perspective, it is considered a low-resource language, with limited resources to support Natural Language Processing (NLP) tasks. Online platforms often facilitate social interactions that can lead to the use of offensive and threatening language, which can go undetected due to the lack of detection systems designed for Hausa. This study aimed to address this issue by (1) conducting two user studies (n=308) to investigate cyberbullying-related issues, (2) collecting and annotating the first set of offensive and threatening datasets to support relevant downstream tasks in Hausa, (3) developing a detection system to flag offensive and threatening content, and (4) evaluating the detection system and the efficacy of the Google-based translation engine in detecting offensive and threatening terms in Hausa. We found that offensive and threatening content is quite common, particularly when discussing religion and politics. Our detection system was able to detect more than 70% of offensive and threatening content, although many of these were mistranslated by Google's translation engine. We attribute this to the subtle relationship between offensive and threatening content and idiomatic expressions in the Hausa language. We recommend that diverse stakeholders participate in understanding local conventions and demographics in order to develop a more effective detection system. These insights are essential for implementing targeted moderation strategies to create a safe and inclusive online environment.

detection, detection system, hausa language, (14 more...)

arXiv.org Artificial Intelligence

2311.10541

Country:

North America > United States > Texas > Harris County > Houston (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Africa > Nigeria > Jigawa State > Dutse (0.05)
(5 more...)

Genre:

Research Report (1.00)
Questionnaire & Opinion Survey (1.00)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Services (1.00)
Information Technology > Security & Privacy (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
(2 more...)

Add feedback

HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language

Parida, Shantipriya, Abdulmumin, Idris, Muhammad, Shamsuddeen Hassan, Bose, Aneesh, Kohli, Guneet Singh, Ahmad, Ibrahim Said, Kotwal, Ketan, Sarkar, Sayan Deb, Bojar, Ondřej, Kakudi, Habeebah Adamu

arXiv.org Artificial IntelligenceMay-28-2023

This paper presents HaVQA, the first multimodal dataset for visual question-answering (VQA) tasks in the Hausa language. The dataset was created by manually translating 6,022 English question-answer pairs, which are associated with 1,555 unique images from the Visual Genome dataset. As a result, the dataset provides 12,044 gold standard English-Hausa parallel sentences that were translated in a fashion that guarantees their semantic match with the corresponding visual information. We conducted several baseline experiments on the dataset, including visual question answering, visual question elicitation, text-only and multimodal machine translation.

artificial intelligence, natural language, question answering, (17 more...)

arXiv.org Artificial Intelligence

2305.1769

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Africa > Nigeria > Jigawa State > Dutse (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(32 more...)

Genre: Research Report (0.50)

Industry:

Health & Medicine (0.93)
Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

hauWE: Hausa Words Embedding for Natural Language Processing

Abdulmumin, Idris, Galadanci, Bashir Shehu

arXiv.org Artificial IntelligenceNov-25-2019

Words embedding (distributed word vector representations) have become an essential component of many natural language processing (NLP) tasks such as machine translation, sentiment analysis, word analogy, named entity recognition and word similarity. Despite this, the only work that provides word vectors for Hausa language is that of Bojanowski et al. [1] trained using fastText, consisting of only a few words vectors. This work presents words embedding models using Word2Vec's Continuous Bag of Words (CBoW) and Skip Gram (SG) models. The models, hauWE (Hausa Words Embedding), are bigger and better than the only previous model, making them more useful in NLP tasks. To compare the models, they were used to predict the 10 most similar words to 30 randomly selected Hausa words. hauWE CBoW's 88.7% and hauWE SG's 79.3% prediction accuracy greatly outperformed Bojanowski et al. [1]'s 22.3%.

dataset, representation, vector, (12 more...)

arXiv.org Artificial Intelligence

1911.10708

Country:

North America > United States (1.00)
Europe > Italy (0.04)
Asia > Pakistan (0.04)
(5 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.89)

Add feedback